Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments

نویسندگان

  • Gábor Bartók
  • Dávid Pál
  • Csaba Szepesvári
چکیده

In a partial monitoring game, the learner repeatedly chooses an action, the environment responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his regret, which is the difference between his total cumulative loss and the total loss of the best fixed action in hindsight. Assuming that the outcomes are generated in an i.i.d. fashion from an arbitrary and unknown probability distribution, we characterize the minimax regret of any partial monitoring game with finitely many actions and outcomes. It turns out that the minimax regret of any such game is either zero, Θ̃( √ T ), Θ(T ), or Θ(T ). We provide a computationally efficient learning algorithm that achieves the minimax regret within logarithmic factor for any game.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Toward a Classification of Finite Partial-Monitoring Games

In a finite partial-monitoring game against Nature, the Learner repeatedly chooses one of finitely many actions, the Nature responds with one of finitely many outcomes, the Learner suffers a loss and receives feedback signal, both of which are fixed functions of the action and the outcome. The goal of the Learner is to minimize its total cumulative loss. We make progress towards classification ...

متن کامل

An adaptive algorithm for finite stochastic partial monitoring

We present a new anytime algorithm that achieves near-optimal regret for any instance of finite stochastic partial monitoring. In particular, the new algorithm achieves the minimax regret, within logarithmic factors, for both “easy” and “hard” problems. For easy problems, it additionally achieves logarithmic individual regret. Most importantly, the algorithm is adaptive in the sense that if the...

متن کامل

No Internal Regret via Neighborhood Watch

We present an algorithm which attains O( p T ) internal (and thus external) regret for finite games with partial monitoring under the local observability condition. Recently, this condition has been shown by Bartók, Pál, and Szepesvári [4] to imply the O( p T ) rate for partial monitoring games against an i.i.d. opponent, and the authors conjectured that the same holds for non-stochastic advers...

متن کامل

Non-trivial two-armed partial-monitoring games are bandits

We consider online learning in partial-monitoring games against an oblivious adversary. We show that when the number of actions available to the learner is two and the game is nontrivial then it is reducible to a bandit-like game and thus the minimax regret is Θ( √ T ).

متن کامل

Regret Bounds and Minimax Policies under Partial Monitoring

This work deals with four classical prediction settings, namely full information, bandit, label efficient and bandit label efficient as well as four different notions of regret: pseudoregret, expected regret, high probability regret and tracking the best expert regret. We introduce a new forecaster, INF (Implicitly Normalized Forecaster) based on an arbitrary function ψ for which we propose a u...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011